Signature selection forces and evolutionary divergence of immune-survival genes compared between two important shrimp species

In recent years, shrimp aquaculture industry had grown significantly to become the major source of global shrimp production. Despite that, shrimp aquaculture production was impeded by various shrimp diseases over the past decades. Interestingly, different shrimp species demonstrated variable levels of immune strength and survival (immune-survival) ability towards different diseases, especially the much stronger immune-survival ability shown by the ancient shrimp species, Macrobrachium rosenbergii compared to other shrimp species. In this study, two important shrimp species, M. rosenbergii and Penaeus monodon (disease tolerant strain) (uninfected control and VpAHPND-infected) were compared to uncover the potential underlying genetic factors. The shrimp species were sampled, followed by RNA extraction and cDNA conversion. Five important immune-survival genes (C-type Lectin, HMGB, STAT, ALF3, and ATPase 8/6) were selected for PCR, sequencing, and subsequent genetics analysis. The overall genetic analyses conducted, including Analysis of Molecular Variance (AMOVA) and population differentiation, showed significant genetic differentiation (p<0.05) between different genes of M. rosenbergii and P. monodon. There was greater genetic divergence identified between HMGB subgroups of P. monodon (uninfected control and VpAHPND-infected) compared to other genes. Besides that, based on neutrality tests conducted, purifying selection was determined to be the main evolutionary driving force of M. rosenbergii and P. monodon with stronger purifying selection exhibited in M. rosenbergii genes. Potential balancing selection was identified for VpAHPND-infected HMGB subgroup whereas directional selection was detected for HMGB (both species) and ATPase 8/6 (only P. monodon) genes. The divergence times between M. rosenbergii and P. monodon genes were estimated through Bayesian molecular clock analysis, which were 438.6 mya (C-type Lectin), 1885.4 mya (HMGB), 432.6 mya (STAT), 448.1 mya (ALF3), and 426.4 mya (ATPase 8/6) respectively. In conclusion, important selection forces and evolutionary divergence information of immune-survival genes between M. rosenbergii and P. monodon were successfully identified.

Therefore, in this study, we aim to investigate the important immune genetic differences between disease tolerant P. monodon and M. rosenbergii shrimps. The experimental focus was given to five selected vital immune genes, including C-type Lectin, HMGB, STAT, ALF3, and ATPase 8/6. Each of these genes has crucial function in the shrimp innate immune response and survival. Genetic sequences were then obtained for these immune-survival genes and subsequently used for evolutionary genetics analyses between P. monodon and M. rosenbergii. The signature genetic divergence, selection forces, and divergence times were successfully determined.

Shrimp sample preparations
Macrobrachium rosenbergii juvenile prawns (15-20 cm body length) were acquired from a local prawn farm. A total number of 50 prawns were dissected with the hepatopancreas samples collected and stored for subsequent downstream applications. On the other hand, similarly, disease-tolerant crossbred (13 th generation Madagascar strain crossed with 5 th generation local strain) Penaeus monodon juvenile shrimps (15-20 cm body length) were obtained from a local shrimp farm. The shrimps were then kept aseptically for seven days acclimatization (27 shrimps in each tank) in aerated artificial seawater (30 ppt) at 28 ± 1.0˚C.

Vp AHPND experimental challenge
2.2.1 Pathogen preparations. P. monodon shrimps suspected with AHPND outbreaks were acquired and validated through gross clinical sign observation and AP3 PCR detection method [32]. The selective propagation of Vp AHPND bacteria [33] was then done using digestive organs of Vp AHPND -infected shrimps through a series of incubations in tryptic soy broth (TSB+), thiosulfate citrate bile salt (TCBS) agar, and tryptic soy agar (TSA+). The propagated bacteria were stored in cryovials (CRYOBANK™) at -80˚C for usage in downstream experiments.

Pre-challenge preparations and experimental challenge.
The acclimatized P. monodon shrimps were negatively screened using AP3 PCR method [32] for validation to be Vp AHPND -free prior to the experimental challenge. During the Vp AHPND experimental challenge, the P. monodon shrimps were infected with Vp AHPND bacteria (KS17.S5-1 positive strain) (2 × 10 6 cfu/ml) [34] using a modified immersion method [10]. The negative control group shrimps were added with sterile TSB+ broth instead of Vp AHPND bacteria. The shrimp hepatopancreas samples were obtained at 0, 3, 6, 12, 24, 36, and 48 hours post-infection (hpi) and stored at -80˚C. The experimental challenge details were also described in a previous publication [33].
All shrimp tanks were equipped with water filters and aerators. The experimental challenge involved three biological replicates for each treatment and control group. The challenged P. monodon shrimps were positively screened using AP3 PCR method [32] to confirm the presence of Vp AHPND bacteria. The ethical approval was given by University of Malaya for the study within its facilities (Ethical Application Ref: S/31012019/26112018-05/R).

Total RNA extraction and first strand cDNA synthesis
Total RNA samples were extracted from M. rosenbergii prawn and P. monodon shrimp hepatopancreas using TransZol Up Plus RNA Kit (TransGen Biotech, Beijing, China). The extracted RNA samples were then used for first-strand cDNA synthesis and DNA contaminant removal using TransScript1 One-Step gDNA Removal and cDNA Synthesis SuperMix (TransGen Biotech, Beijing, China). The manufacturer's protocols were followed for the kits used. The cDNA samples acquired were stored and utilized in downstream experiments.

PCR and sequencing
Primer design was conducted using Primer Quest Tool software (https://sg.idtdna.com/ Primerquest/home/Index) by referring to nucleotide sequences either downloaded from NCBI database or selected from previously acquired RNA-Seq analysis data (Differentially Expressed Genes sequences). The detailed information of designed and used primers were provided in S1A & S1B Table. The cDNA sequences of selected immune genes (C-type Lectin, HMGB, STAT, ALF3, ATPase 8/6) were determined through PCR amplification and Sanger Sequencing. A range of 30 to 50 individual sequences were obtained for each of the immune genes for both M. rosenbergii and P. monodon. Major regions of the gene sequences (>70%) particularly conserved domains were amplified using the designed primers to ensure validity and importance of the sequences obtained.
Subsequently, the PCR product validation was done through agarose gel electrophoresis analysis, which involved Vivantis 100 bp DNA Ladder/100 bp Plus DNA Ladder (Vivantis Technologies, Selangor, Malaysia) and Major Science MP-300V/MBE-150 device (Major Science, California, USA). The validated PCR products were utilized for Sanger Sequencing analysis. The sequences obtained were then trimmed and compiled using MEGA7 [35] and Chromas (https://technelysium.com.au/wp/chromas/) software. The compiled sequences were used in subsequent genetics analyses.

Evolutionary genetics analysis
The acquired cDNA sequences were compared and analysed through different software, including Arlequin (Version 3. Besides that, DNA divergence analysis was done through DnaSP software [37]. The number of conserved or diverged sites were also determined at both nucleotide and amino acid levels through MEGA7 software [35]. Subsequently, genetic differentiation display and PCA analysis were plotted using R software [38] for total cDNA sequences comparison (based on genetic distances). The software packages applied were ape, reshape2, and ggplot2 (genetic differentiation display) and ape, ggfortify, and ggplot2 (PCA analysis) respectively.

Phylogenetic and Bayesian molecular clock dating analyses
For further genetic divergence analysis, the acquired cDNA sequences were checked with JMo-delTest2 software [39, 40] to identify the optimal models of nucleotide substitution. The optimal nucleotide models were determined based on the best delta values of Akaike and Bayesian information criteria (AIC and BIC) for phylogenetic analysis and Bayesian analysis respectively. The total cDNA sequences were utilized for phylogenetic analysis using MEGAX software [41] through Maximum likelihood method (Number of bootstraps: 1000). Besides that, Bayesian Molecular Clock Dating analysis was also conducted using BEAST2 software [42] with relaxed clock model [43] (Chain length: 10000000 or before infinity and ESS>200) in this study. Different cDNA gene sequences including C-type Lectin, HMGB, STAT, ALF3, and ATPase 8/6 were involved. The genetic divergence time estimations were done by referring to a previously constructed time tree involving crabs, lobsters, and shrimps [44]. The estimated divergence time information between M. rosenbergii prawn family, Palaemonidae and P. monodon shrimp family, Penaeidae was utilized as the reference for the genetic divergence time estimation in this study. The prediction of divergence time, ancient divergence period, and evolutionary event was achieved through the Bayesian analysis.

Genetic conservation or divergence analysis
Generally, Penaeus monodon and Macrobrachium rosenbergii gene sequences, which comprise five immune genes (C-type Lectin, HMGB, STAT, ALF3, and ATPase 8/6), were used for the genetic conservation or divergence analysis. A range of 30 to 50 sequences were utilized for each of the immune genes from both P. monodon and M. rosenbergii. The genetic conservation or divergence was initiated with AMOVA analysis to identify the degree of genetic differentiation as shown in (Table 1A-1C). From the AMOVA results, significant genetic differentiations (p<0.05) were determined between different genes, species-based combined gene sequences, and inter-species comparison within individual genes. The AMOVA results were supported by subsequent Population Matrix Differentiation and Population Pairwise FSTs analyses conducted. Based on the Population Matrix Differentiation analysis conducted (S2A Table), there were significant genetic differentiations (p<0.05) between different gene sequences of P. monodon and M. rosenbergii. This finding was validated through Population Pairwise FSTs analysis (S3 Table). Intriguingly, a subgroup comparison between Vp AHPND -infected and uninfected control P. monodon gene sequences (S2B Table) showed significant genetic differentiations (p<0.05) for the C-type Lectin, HMGB, and ALF3 genes. Among these genes, HMGB had the highest level of genetic differentiation between its subgroups. This finding was validated by the DNA divergence analysis as shown in S4 Table. The number of conserved and diverged sites (S5A & S5B Table) were counted at both nucleotide and translated amino acid levels in which major divergence was found at 5' and 3' regions. In addition, for individual gene sequence variations, M. rosenbergii C-type Lectin showed the highest divergence whereas P. monodon STAT showed the highest conservation. These significant genetic differentiations were supported and visualized by further analyses of genetic differentiation display (Fig 1) and PCA (Fig 2). Overall, from the evolutionary genetics analysis done, it can also be identified that the gene with the highest inter-species variance was STAT whereas the gene with the lowest inter-species variance was C-type Lectin.

Selection forces
The same gene sequences from the genetic conservation or divergence analysis were further utilized for the analysis of selection forces. Neutrality tests, including Ewens-Watterson, Tajima's D, and Fu's Fs (Table 2), were successfully conducted to analyse the selection forces acting on different genes of P. monodon and M. rosenbergii together with different subgroups of P. monodon. The results revealed that strong purifying selection force was associated with immune genes of both shrimp species. Overall, M. rosenbergii had stronger purifying selection forces detected compared to P. monodon except for the P. monodon STAT gene which possessed the strongest purifying selection force. Balancing selection force was detected for P. monodon C-type Lectin gene uninfected control subgroup. Interestingly, P. monodon HMGB gene Vp AHPND -infected subgroup showed balancing selection force as well whereas its uninfected control subgroup showed purifying selection force. Besides that, directional selection forces were identified in M. rosenbergii HMGB and ATPase 8/6 genes along with P. monodon HMGB and STAT genes.

Evolutionary divergence analyses
The total cDNA sequences used in the genetic and selection force analyses were analysed for evolutionary divergence through phylogenetic and Bayesian analyses. A maximum likelihoodbased phylogenetic analysis was conducted using the total cDNA sequences as shown in Fig 3. From the phylogenetic analysis, it can be clearly observed that different gene sequences of P. monodon and M. rosenbergii were clustered accordingly. The gene sequences of C-type Lectin from the two shrimp species were grouped together in the closest distance followed by ALF3 gene sequences compared to other gene sequences. In addition, the genetic divergence times followed by event estimations of these immune genes between P. monodon and M. rosenbergii were determined through Bayesian Molecular Clock Dating method (Fig 4A-4E)

PLOS ONE
Selection forces, evolutionary divergence, and immune-survival genes (2500 mya to 542 mya). Hence, a higher mutation rate was implied for the HMGB gene compared to the other genes which resulted in the very much earlier estimated divergence time in the Proterozoic era. The expected divergence time of the two shrimp species was indicated to be within the Paleozoic era as inferred from the estimated divergence times of the other genes obtained. The mentioned time periods were referred to a previously reported geological time scale [45,46].

Evolutionary genetic variations and selection forces
Evolutionary genetic variations were demonstrated through AMOVA analysis results which showed that the compared immune genes are significantly different from each other in the same shrimp species or between Macrobrachium rosenbergii and Penaeus monodon. The significant inter-species genetic differentiations inferred that P. monodon and M. rosenbergii are distinctly separated in terms of genetic evolution. This finding is supported by a previously constructed prophenoloxidase phylogeny tree in which M. rosenbergii did not cluster together with P. monodon or other Penaeidae shrimps [47]. Notably, P. monodon HMGB possessed the highest subgroup genetic divergence (between uninfected control and Vp AHPND -infected) followed by C-type Lectin and ALF3 compared to other genes. This observation was deduced to be caused by the occurrence of gene or transcript sequence alterations during AHPND infection for adaptation purpose due to the selection force exerted by the invading pathogen. A similar example would be the proposed host

PLOS ONE
Selection forces, evolutionary divergence, and immune-survival genes immune adaptation mechanisms triggered by HIV infection involving DNA methylation and alternative splicing, which then affected immune gene expression [48].
At individual species gene level, M. rosenbergii C-type Lectin had the highest level of variance whereas P. monodon STAT had the lowest level of variance. The individual species gene sequence variance is affected by the occurrence of Single Nucleotide Polymorphisms (SNPs) related to disease resistance [49]. The highest sequence variance shown in M. rosenbergii Ctype Lectin gene was validated by the previously identified high sequence variations, additional

PLOS ONE
Selection forces, evolutionary divergence, and immune-survival genes domains, and diverse binding ability of the shrimp C-type Lectin due to its importance in immune recognition [50].
On the other hand, the lowest sequence variance shown in P. monodon STAT gene was postulated to be the effect of the highly conserved JAK/STAT pathway core components and functions in evolution [51,52] as exemplified by the previously deduced grouping of the shrimp STAT into ancient STAT family [53]. These individual species gene variations were also associated with the reversed variance level observations at the inter-species level.
Genetic divergence is commonly found outside of conserved domains or translated regions. The highest degrees of divergence discovered in the 5' and 3' gene regions were inferred to be the effect of selection forces. This inference was supported by the identification of positive and negative selection forces that influenced the 5' UTRs, 3' UTRs, and cis-regulatory regions which then caused sequence variations in humans [54]. The vital function of 3' UTRs in posttranscriptional regulation contributes to gene expression control and adaptive divergence [55].
The strong purifying selection force detected on the immune genes was postulated to be the main factor that caused the stronger immune response of M. rosenbergii in the recognition and elimination of invading pathogens. The P. monodon STAT gene with the strongest purifying selection force detected was also suggested to be crucial in the disease tolerance of the challenged P. monodon used in this study. Purifying selection is the most crucial evolutionary force of immune genes [25,[56][57][58] to achieve broad recognition of pathogens perhaps through conserved motifs [25]. Purifying selection was also inferred to be essential in the RNA sequence regulation for the maintenance of the correct RNA or protein functionalities in the innate immune system as shown by the previous discovery of its importance in the regulation of long dsRNA through RNA editing [59].
The P. monodon HMGB gene was postulated to be influenced by the balancing selection force formed during AHPND infection. Balancing selection plays vital role in the adaptive evolution across innate immune genes, particularly those involved in pathogen recognition. Immune genes showed molecular signatures of adaptation in response to various pathogens and associated selective pressures [60][61][62]. Besides that, directional selection was suggested to be an important evolutionary selection force acting on M. rosenbergii and P. monodon immune genes. The influence of directional selection on a Major Histocompatibility Class I marker in Rana temporaria was vital to achieve adaptation to Ranavirus [63]. A variable model of selection had also been determined for host-pathogen coevolution, which can be either balancing or directional selection for Daphnia pulex [61].

Evolutionary divergence and time estimation
The immune gene sequences of M. rosenbergii and P. monodon were compared and grouped accordingly in the phylogenetic analysis. Interestingly, M. rosenbergii and P. monodon C-type Lectin gene sequences were clustered closest to each other followed by ALF3 gene sequences compared to other gene sequences. This observation was postulated to be caused by the lowest level of divergence between M. rosenbergii and P. monodon C-type Lectin cDNA sequences among the compared gene sequences. C-type Lectin-like domains not only display recognition and functional versatility, but also possess highly conserved disulphide bridges associated with conserved hydrophobic and polar interactions [64]. The inter-species gene sequence conservation of C-type Lectin was supported by the previous demonstrations across different holometabolous insect species [65] and phylogenetically distinct viper species [66].
Besides that, similar inference of low ALF3 sequence divergence level between M. rosenbergii and P. monodon was also made to be accountable for the observed ALF3 gene sequences clustering. The inter-species gene sequence conservation of ALF3 was highlighted by the previously identified importance of conserved functional domain together with associated amino acid property and disulphide bond in maintaining the strong antibacterial activity of shrimp ALF [67]. Subsequently, the higher levels of inter-species evolutionary conservation shown in C-type Lectin and ALF3 genes compared to other immune genes were suggested to be the effect of their similar functional importance in the recognition and elimination of invading pathogens. This inference was validated by an example of the evolutionary conservation of a DExD/H-box helicase called DDX23 that plays vital role in the innate immune recognition of an RNA virus [68].
In addition, the estimated divergence times of C-type Lectin, STAT, ALF3, and ATPase 8/6 genes between M. rosenbergii and P. monodon were determined to be within the time range of 426.4 million years ago (mya) to 448.1 mya except for HMGB gene which was estimated at 1885.4 mya. These estimated divergence times were mostly grouped within the Paleozoic era (542 mya to 251 mya) [45,46]. The main evolutionary event inferred to be responsible for these genetic divergences was the Cambrian Explosion. The Cambrian Explosion that happened around 520 mya to 515 mya caused the emergence of many different life forms on Earth, particularly those with hard body parts and eyes [69]. After the Cambrian animal diversification, the Ordovician and Silurian periods within the Paleozoic era were the crucial time periods of land colonization by arthropods and tetrapods [46].
The involvement of the oldest common ancestor of M. rosenbergii and P. monodon in the estimated genetic divergence was inferred as the oldest identified shrimp fossil can be traced back to the Devonian period (416 mya to 359.2 mya) [70]. Based on the fossil record, the occurrence of lineage divergence within the Penaeid shrimp superfamily, Penaeoidea was estimated to be around late Permian period (before 250 mya) [71]. This finding supported the estimated genetic divergence times between M. rosenbergii and P. monodon in the time periods before Permian period in this study. Furthermore, the deduced divergence time of the two shrimp species was shown to be within the Paleozoic era based on the estimated divergence times of the immune genes obtained. Intriguingly, a higher mutation rate was postulated for the HMGB gene evolution compared to the other genes which resulted in the very much earlier estimated divergence time in the Proterozoic era. The evolutionary mutation rates of different immune genes might be time dependent as exemplified by the previously demonstrated time dependency of human mitochondrial DNA mutation rate estimates [72]. Thus, the higher mutation rate of HMGB gene inferred might be caused by a unique evolutionary event that occurred in a particular time period.

Conclusion
In conclusion, significant genetic differentiations were identified among the immune-survival genes compared between Macrobrachium rosenbergii and Penaeus monodon, including C-type Lectin, HMGB, STAT, ALF3, and ATPase 8/6. For individual species genes, M. rosenbergii Ctype Lectin showed the highest level of sequence divergence whereas P. monodon STAT showed the highest level of sequence conservation. For neutrality tests conducted, purifying selection was identified to be the main evolutionary force acting on the immune-survival genes of M. rosenbergii and P. monodon with overall stronger purifying selection force determined for M. rosenbergii immune-survival genes. The strongest purifying selection force was found on P. monodon STAT gene. Balancing selection was also determined to have occurred and acted on the P. monodon HMGB gene during AHPND infection. In addition, directional selection forces were identified for M. rosenbergii HMGB and ATPase 8/6 genes together with P. monodon HMGB and STAT genes. Based on the phylogenetic analysis, the C-type Lectin gene sequences were determined to be clustered closest between M. rosenbergii and P. monodon followed by ALF3 gene sequences compared to other genes. Furthermore, the Bayesian analysis successfully revealed estimated divergence times of C-type Lectin (438.6 mya), HMGB (1885.4 mya), STAT (432.6 mya), ALF3 (448.1 mya), and ATPase 8/6 (426.4 mya) genes between M. rosenbergii and P. monodon. The estimated divergence times were mostly within the Paleozoic era except for the unique estimated divergence time of HMGB in the Proterozoic era.
Evolutionary genetics analyses, including genetic differentiations, evolutionary selection forces, and estimation of divergence times were conducted for the immune-survival genes between M. rosenbergii and P. monodon. The important information uncovered from these evolutionary genetics analyses can bring deeper insight into shrimp genetic studies and potentially be referred by other researchers in similar field for experimental applications. Further evolutionary genetics analysis can also be conducted across many other shrimp species. The PCR primers designed in this study covering the conserved regions of immune-survival genes can be developed as shrimp health or immune activation diagnosis biomarkers. The developed biomarkers can then be utilized by shrimp farmers to minimize shrimp disease outbreaks. These biomarkers can assist in shrimp farm management and enhance precision aquaculture. The importance of biomarker application in shrimp aquaculture was emphasized by previous studies involving immune activation [73], temperature stress response [74], and environmental stress response [75]. The genetic information obtained can also assist in the development of stronger selective breeding programs to produce genetically enhanced shrimp lines with disease resistance or tolerance abilities.